Natural Languages as Collections of Resources *

Author

  • Aarne Ranta
Abstract

We propose a shift in perspective from the view of natural languages as formal languages to natural languages as a collection of resources for constructing local languages for use in particular situations. This is suggested by our experience constructing natural language grammars for particular applications using the Grammatical Framework. It points to a research programme investigating how such resources play a role in linguistic innovation by agents constructing situation-specific local languages and how they can be made dynamic, modified by the linguistic agent’s exposure to innovative linguistic data.

* This work was supported by Vetenskapsrådet project 2005-4211 Library-based Grammar Engineering, http://www.cs.chalmers.se/~aarne/GF/doc/vr.html.

1 Natural languages and formal languages

The view of natural languages as formal languages played a significant role in the development of linguistics in the second half of the twentieth century. The view of languages as sets of strings underlay the early development of generative transformational grammar. The following famous quotation from Montague’s (1974) ‘Universal Grammar’ represents a cornerstone of work on formal semantics since the seventies:

“There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians; indeed I consider it possible to comprehend the syntax and semantics of both kinds of languages within a single natural and mathematically precise theory.”

Formal semantics involves the treatment of interpreted formal languages, which may be regarded as sets of pairs of strings and meanings. Chomsky (1980, and in other work) makes clear that he does not regard natural languages as formal languages, or at least that to regard them in terms of string sets is at best missing the point of linguistic theory and at worst dealing with an incoherent notion. In his view, speakers of a natural language have acquired particular individual instantiations of universal principles. It is the nature of the principles and the parameters involved in instantiating them which are of interest for linguistic theory, not the nature of a string set corresponding to an external public language, which may not be consistent since each speaker will have acquired their own individual grammar.

While the view we propose in this paper is somewhat different to Chomsky’s, we share with him a focus on the formal characterization of the resources available to a speaker of a natural language, rather than a characterization of a set of strings (or string-meaning pairs) which are to be regarded as grammatical in the language.

In this paper we will first discuss the advantages and disadvantages of regarding natural languages as formal languages (section 1.1). We will then (section 1.2) propose a view on which natural languages are rather to be regarded as collections of resources, a toolbox which can be used for constructing languages in the formal sense. This view arose from work on the Grammatical Framework, an implemented system for the construction of small application grammars based on general resources for natural languages, and we will give a brief characterization of the system in section 2. Finally, we will speculate on how our view could be extended to accommodate linguistic innovation (section 3).

1.1 Are natural languages formal languages?

The view of natural languages as formal languages was a tremendously productive abstraction which enabled us to apply twentieth-century logical techniques to the characterization of human linguistic ability. Without it we would be able to say very little that is mathematically precise about the structure of language and its interpretation. It has provided great insight into the nature of language and has




Publication date: 2007